AITopics | diagonal linear network

Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate our theoretical findings.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Machine Learning

2605.17177

Country: North America > United States > New York (0.27)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

17a9ab4190289f0e1504bbb98d1d111a-Paper-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 08:29:00 GMT

artificial intelligence, iterate, machine learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Inductive biases of multi-task learning and finetuning: multiple regimes of feature reuse

Neural Information Processing Systems, Feb-18-2026, 07:44:09 GMT

Neural networks are often trained on multiple tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). In particular, it is common practice to pretrain neural networks on a large auxiliary task before finetuning on a downstream task with fewer samples. Despite the prevalence of this approach, the inductive biases that arise from learning multiple tasks are poorly characterized. In this work, we address this gap.

artificial intelligence, machine learning, relu network, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

b5b528767aa35f5b1a60fe0aaeca0563-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-16-2026, 16:48:19 GMT

artificial intelligence, linear network, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network

Neural Information Processing Systems, Feb-16-2026, 16:48:15 GMT

Unfortunately, even for standard linear networks in the regression setting, a comprehensive characterization of the implicit bias remains an open problem.

artificial intelligence, linear network, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)

(S)GD over Diagonal Linear Networks: Implicit bias, Large Stepsizes and Edge of Stability

Neural Information Processing Systems, Feb-12-2026, 10:13:19 GMT

Currently, most theoretical works on implicit regularisation have primarily focused on continuous time approximations of (S)GD where the impact of crucial hyperparameters such as the stepsize and the minibatch size is ignored. One such common simplification is to analyse gradient flow, which is a continuous time limit of GD and minibatch SGD with an infinitesimal stepsize. By definition, this analysis does not capture the effect of stepsize or stochasticity.
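
The limit described in this excerpt can be written out as a short sketch. The notation is an assumption made here for illustration and does not come from the excerpt: $L$ is the training loss, $\theta_k$ the parameters after $k$ steps, and $\gamma$ the stepsize.

```latex
\theta_{k+1} = \theta_k - \gamma \, \nabla L(\theta_k)
\quad \xrightarrow[\;\gamma \to 0,\ t = k\gamma\;]{} \quad
\frac{\mathrm{d}\theta_t}{\mathrm{d}t} = -\nabla L(\theta_t)
```

The stepsize $\gamma$ no longer appears in the limiting equation. For minibatch SGD, $\nabla L(\theta_k)$ is replaced by a minibatch estimate, and the noise of that estimate averages out in the same limit, which is why gradient flow cannot capture the effect of either hyperparameter.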

artificial intelligence, machine learning, stepsize, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

5da6ce80e97671b70c01a2e703b868b3-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 10:13:16 GMT

artificial intelligence, machine learning, stepsize, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Saddle-to-Saddle Dynamics in Diagonal Linear Networks

Neural Information Processing Systems, Feb-8-2026, 08:35:12 GMT

The main result is informally presented here.

artificial intelligence, iterate, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

(S)GD over Diagonal Linear Networks: Implicit bias, Large Stepsizes and Edge of Stability

Neural Information Processing Systems, Dec-25-2025, 13:25:36 GMT

In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over 2-layer diagonal linear networks. We prove the convergence of GD and SGD with macroscopic stepsizes in an overparametrised regression setting and characterise their solutions through an implicit regularisation problem. Our crisp characterisation leads to qualitative insights about the impact of stochasticity and stepsizes on the recovered solution. Specifically, we show that large stepsizes consistently benefit SGD for sparse regression problems, while they can hinder the recovery of sparse solutions for GD. These effects are magnified for stepsizes in a tight window just below the divergence threshold, in the "edge of stability" regime. Our findings are supported by experimental results.
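
To make the setting of this abstract concrete, here is a minimal NumPy sketch of GD and minibatch SGD on a 2-layer diagonal linear network. It is an illustration under stated assumptions, not the authors' code: the parametrisation `beta = u * v`, the initialisation `u = sqrt(2) * alpha`, `v = 0`, the squared loss, and every function and parameter name are hypothetical choices made here.

```python
import numpy as np


def squared_loss(X, y, beta):
    """Mean squared-error loss (1 / 2n) * ||X beta - y||^2."""
    residual = X @ beta - y
    return 0.5 * np.mean(residual ** 2)


def train_diagonal_linear_network(X, y, stepsize=0.02, alpha=0.1,
                                  n_steps=5000, batch_size=None, seed=0):
    """Run GD (batch_size=None) or minibatch SGD on beta = u * v.

    Assumed setup (not taken from the abstract): u starts at
    sqrt(2) * alpha, v starts at zero, and the loss is the squared loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    u = np.full(d, np.sqrt(2.0) * alpha)
    v = np.zeros(d)
    for _ in range(n_steps):
        if batch_size is None:
            Xb, yb = X, y  # full-batch gradient descent
        else:
            idx = rng.choice(n, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]  # minibatch SGD
        # Gradient of the loss with respect to the linear predictor beta.
        g = Xb.T @ (Xb @ (u * v) - yb) / len(yb)
        # Chain rule through beta = u * v; both weights update simultaneously.
        u, v = u - stepsize * v * g, v - stepsize * u * g
    return u * v
```

Calling the function with `batch_size=None` gives GD, and any integer no larger than the number of samples gives SGD, so the two can be compared across values of `stepsize` on the same overparametrised problem (more features than samples).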

diagonal linear network, implicit bias, stepsize and edge, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

Neural Information Processing Systems, Dec-25-2025, 06:31:33 GMT

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous time version, namely stochastic gradient flow. We explicitly characterise the solution chosen by the stochastic flow and prove that it always enjoys better generalisation properties than that of gradient flow. Quite surprisingly, we show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias. To complete our analysis, we provide convergence guarantees for the dynamics. We also give experimental results which support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and they help explain the better performance of stochastic gradient descent over gradient descent observed in practice.

diagonal linear network, implicit bias, provable benefit, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

High-dimensional Limit of SGD for Diagonal Linear Networks
